The Importance of Informative Feature Representations

Author

  • Ricardo Vilalta
Abstract

Typical issues under consideration when selecting or designing a classification algorithm are the bias and variance components of error induced by the algorithm [1]. For example, one may choose a simple algorithm (e.g., a linear combination of feature values, Naive Bayes, or single logical rules) and draw a hypothesis from a small family of functions; the poor repertoire of functions may produce high bias (the best function in the family may be far from the target function) but low variance (because the hypothesis is insensitive to local data irregularities). The alternative is to increase the degree of complexity by drawing a hypothesis from a large class of functions (e.g., neural networks with a large number of hidden units); here the hypothesis exhibits flexible decision boundaries (low bias) but becomes sensitive to small variations in the data (high variance). A less explored, but perhaps more critical, issue is that of the feature representation, which can be the cause of a third component of error known as Bayes (irreducible) error. This error occurs when the feature representation leads to class overlap. While bias and variance can be traded off by varying the classification strategy, Bayes error cannot be changed once the feature representation is fixed. High-quality features are crucial for accurate predictions, and their importance cannot be overemphasized [2]. High-quality features convey much information about the problem; in this case, even a simple hypothesis suffices to produce good results. In contrast, low-quality features complicate the classification process: features can correlate poorly with the class, or interact in many ways, which calls for additional steps to discover important feature combinations.
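The closing point of the abstract, that interacting features defeat a simple hypothesis until the right feature combination is supplied, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data (two uniform features whose sign agreement defines the class), the least-squares linear classifier standing in for a "simple algorithm", and the helper names `make_interacting_data`, `fit_linear`, and `with_product`.

```python
import numpy as np

def make_interacting_data(n, rng):
    """Two features in [-1, 1]; the class is +1 when their signs agree, else -1.
    Neither feature alone is correlated with the class (hypothetical data)."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)
    return X, y

def add_intercept(X):
    """Append a constant column so the linear model has a bias term."""
    return np.column_stack([X, np.ones(len(X))])

def fit_linear(X, y):
    """Least-squares fit of a linear combination of feature values."""
    w, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return w

def accuracy(w, X, y):
    """Fraction of points whose predicted sign matches the label."""
    pred = np.where(add_intercept(X) @ w > 0, 1.0, -1.0)
    return float(np.mean(pred == y))

def with_product(X):
    """Engineered representation: append the product of the two features."""
    return np.column_stack([X, X[:, 0] * X[:, 1]])

rng = np.random.default_rng(0)
X_train, y_train = make_interacting_data(2000, rng)
X_test, y_test = make_interacting_data(2000, rng)

# Same simple hypothesis class, two different feature representations.
w_raw = fit_linear(X_train, y_train)
acc_raw = accuracy(w_raw, X_test, y_test)

w_eng = fit_linear(with_product(X_train), y_train)
acc_eng = accuracy(w_eng, with_product(X_test), y_test)

print(f"linear model, raw features:         {acc_raw:.3f}")
print(f"linear model, with product feature: {acc_eng:.3f}")
```

The classifier is identical in both runs; only the representation changes. On the raw features no linear boundary can separate the classes, whereas with the product feature the same hypothesis family separates them almost perfectly.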

Similar resources

iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. If false positives are considered items that reduce the selectiveness of a pattern, then the more selective the pattern, the more accur...


Topographies of Hate: Islamophobia in Cyberia

Islamophobia’s occurrence in any particular country has little to do with the presence of Muslims; it is possible to be Islamophobic when there are virtually no Muslims around. This is because the lack of Muslims is filled by a surplus of Islamophobic representations. This surplus of representations is now increasingly reliant on the internet. There are many studies reporting on Islamophobia on the i...


Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...


Permutation importance: a corrected feature importance measure

Motivation: In the life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in recent years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support ...


Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning

This paper proposes a novel approach for effectively utilizing unsupervised data in addition to supervised data for supervised learning. We use unsupervised data to generate informative ‘condensed feature representations’ from the original feature set used in supervised NLP systems. The main contribution of our method is that it can offer dense and low-dimensional feature spaces for NLP tasks w...


EMG-based wrist gesture recognition using a convolutional neural network

Background: Deep learning has revolutionized artificial intelligence and has transformed many fields. It allows processing high-dimensional data (such as signals or images) without the need for feature engineering. The aim of this research is to develop a deep learning-based system to decode motor intent from electromyogram (EMG) signals. Methods: A myoelectric system based on convolutional ne...


Publication date: 2011